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Abstract 

Robertson et al.(Reports, July 25 2014, p440-444)(l) claimed that activity-induced 
variability is responsible for the Doppler signal of the proposed planet candidate 
GJ 581d. We point out that their analysis using periodograms of residual data 
is incorrect, further promoting inadequate tools. Since the claim challenges the 
viability of the method to detect exo-Earths, we urge for a correct re-analysis 
(provided as an appendix in pre-print version). 

GJ 58Id was the first planet candidate of a few Earth masses reported 
in the circumstellar habitable zone of another star (1). It was detected by 
measuring the radial velocity variability of its host star using High Accuracy 
Radial Velocity Planet Searcher (HARPS) (1,2). Doppler time series are 
usually modeled as the sum of Keplerian signals plus additional effects (e.g., 
correlations with activity). Detecting a planet candidate consists of quanti¬ 
fying the improvement of a merit statistic when one signal is added to the 
model. Approximate methods are often used to speed up the analyses, such 
as computing periodograms on residual data. Even when models are linear, 
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Fig. 1: This example illustrates why residual statistics must not be used to 
assess significances in multi-parametric fits to data. We want to know 
whether a constant xq is needed to model the position x of a body 
as a function of time t (black dots). Top left panel represents the fit 
to the null hypothesis (model A) which only includes a velocity term 
(red line). Bottom left shows the residuals = x — XAif) to model A. 

An attempt to fit a model with a constant offset to is shown as a 
red line. Top right panel represents a fit adjusting all the parameters 
simultaneously (model B, top right panel, blue line), producing the 
largest possible reduction in y 2 . 


correlations exist between parameters. Similarly, statistics based on residual 
analyses are biased quantities and cannot be used for model comparison. 

A golden rule in data-analysis is that the data should not be corrected, 
but it is our model which needs improvement. The inadequacy of residual 
analyzes can be illustrated using a trivial example (Figure 1). Let’s assume 
16 measurements of the position of an object (x) as a function of time(f) 
and no uncertainties. We are interested in its velocity but we need to decide 
whether constant offset Xo is needed to model the motion. Model A (null hy¬ 
pothesis) consists consists in XA(t) = vt , where v is the only free parameter, 
and the alternative Model B is x_b(£) = Xq + vt . The question is whether 
including x 0 is justified given the improvement of a statistic that we define a 
y 2 = A J^iLi ( x i ~ x{ti)) 2 . Left panel in Figure 1 illustrates an inappropriate 
procedure which consist on adjusting Model A, and then deciding whether a 
constant xq is needed to explain the residuals (bottom left panel). Since such 
residuals are far from a constant shift, the reduction of y 2 is not maximal 
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and the fit to a constant offset unsatisfactory. By subtracting model A from 
the data, we have created a new time-series which is no longer representa¬ 
tive of the original one. A more meaningful procedure consists in comparing 
model A to a the global fit to all the parameters of model B (top right panel) 
achieving maximal improvement of our statistic. 

Similarly, the analysis in Robertson et al. (3) only shows that the signal of 
GJ 58Id is not present in their new residual time-series. Their procedure is 
summarized as follows. Their figures 1 and S3 were used to suggest RV/H a 
correlations contaminating the measurements. After subtracting those cor¬ 
relations and the first three planets, periodograms(4-6) were applied to the 
residuals to show that GJ 58Id fell below the detectability threshold. While 
the signal of GJ 581d is K ~ 1.6 m/s, the apparent variability induced by 
the RV/H q correlations is 5 m/s peak-to-peak, and the scatter around the 
fits is at the 1.5 — 2 m/s level. Subtracting those correlations biased the 
residuals by removing a model that likely included contributions from real 
signals and additional noise was added due to the scatter in the RV/H a rela¬ 
tions. All things considered, the disappearance of GJ 581d in such residual 
data is not surprising. Following Fig. 1, a simultaneous fit of the 30+ param¬ 
eters involved would be needed to reach meaningful conclusions. Although 
there may be substantial RV/H Q correlations, a global optimization analysis 
may not support that GJ 581d is better explained by activity. A complete 
analysis will be presented elsewhere (see Appendix on this preprint version). 

We argue that the results of Robertson et al. (3) come from the improper 
use of periodograms on residual data as they implement the same flawed pro¬ 
cedure illustrated in Figure 1. Despite such periodograms are useful to pro¬ 
vide quick-look analyses, their inadequacy to the task has been abundantly 
discussed in the literature(7-12). Explicitly, derived false alarm probabilities 
would be representative only if a model with one-sinusoid and one offset is 
a sufficient description of the data, measurements are uncorrelated, noise is 
normally distributed, and uncertainties are fully characterized (5). Every 
single of these hypothesis breaks down when dealing with Doppler residuals: 
the number of signals in not known a priori, fits to data correlate residuals, 
and formal uncertainties are never realistic. Proposed alternatives such as 
Monte Carlo bootstrapping of periodograms(5) do not help either, as those 
methods ignore correlations as well. Resulting biases can lead to significance 
assessments off by several orders of magnitude. These issues were irrelevant 
when Doppler amplitudes abundantly exceeded uncertainties. For example, 
an amplitude larger than three times the uncertainties and more than 20 mea¬ 
surements easily leads to false alarm probabilities smaller than 10~ 6 , which 
is much smaller than usual thresholds at 1%-0.1%. For this reason, large 
biases were not problematic in the early detection of gas giants (K~ 50 m/s 
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and a ~ 5 m/s)(13), and it is the main reason why periodograms of residual 
data are still wide-spread tools in Doppler analyses despite their inadequacy 
to the task. 

In summary, analysis of significance using residual data statistics leads 
to incorrect significance assessments. While this has been a common prac¬ 
tice in the past, the problem is now exacerbated with signals closer to the 
noise and increased model complexity. The properties of the noise can be 
included to the model, but never subtracted from the data. This discussion 
directly impacts the viability of the Doppler method to find Earth-like plan¬ 
ets. While Earth causes a 0.1 m/s wobble around the Sun, the long-term 
stability of the most quiet stars seems not better than 0.8 m/s(3). That is, 
activity induced variability can be 5-10 times larger than the signal. While 
global optimization does not provide absolute guarantees of success, analyzes 
based on residual statistics are certainly bound to failure. If activity poses an 
ultimate barrier to the detection of small planets, strategic long-term plans 
concerning large projects will need serious revision(14). It is of capital im¬ 
portance that analysis and verification of multi-planet claims are properly 
done using global-optimization techniques and by acquiring additional ob¬ 
servations. 
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A GJ 581d is not better explained by stellar activity 


We present a re-analysis of the data presented in [Robertson et ali ( 2014 ) 
(R14 hereafter) and show that the main conclusion of that manuscript (’GJ 
581 does not exist’, abstract quoting) is not supported by a gobal fit to the 
data. The analysis is done by adding parameterized effects to the model 
(never subtracting them) and using the maximum likelihood statistic as a 
figure of merit (a generalization of the y 2 statistic). We limit ourselves to 
a frequentist framework, which is sufficient to illustrate the perils of analise 
based on residual statistics. In all that follows, we use exactly the same 
data as provided in R14 to show that the discrepant result comes from basic 
statistical assumptions, and it is not a matter about the quality or properties 
of the data. 

In Section IA.11 we outline the Doppler model used and show how the 
correlations with activity indices are introduced in it. Section IA.2I reviews 
how to produce periodograms that account for the presence of several free 
parameters in addition to the new signal of interest. Results of the analysis 
are given in lA.31 In Section lA.4l we argue that, although the correlations with 
the Ih,, index are substantial, there is no clear evidence for time-variability, 
and highlight the perils of applying arbitrary slicing to datasets and fitting 
unconstrained correlation laws. Concluding remarks are given in Section lA.5l 


A.l Doppler and statistical model 

Our model to predict the radial measured radial velocity v of a star given 
the presence of k-planets is given by 


0]ti = 7/ + 5Z U\ + v(ti - t 0 ) + CiOii 


( 1 ) 


where ti is the instant of each observation, 7/ is a constant offset of each 
instrument / (or dataset), it is a term to account for a long term secular ac¬ 
celeration common to all datasets, and the usual Keplerian parameter of each 
planet candidate p are consolidated in 9 P : Period P p in days, amplitude K p 


in m s 


-1 


, eccentricity e p , argument of periastron uj w in degrees, and initial 
mea n Mo r in degrees. As discussed in R14 a nd already proposed in the past 


(eg. iQueloz et all 120011: iBonfils et all 120071) . some activity indices can lin 


early correlate with spurious Doppler offsets. The rightmost term accounts 
for such correlation and cq is some simultaneaous activity measurement (eg. 
I Ha provided in R14 in this case). Cj can differ between instruments (corre¬ 
lations can be wavelength and resolution dependent), so one C/ is needed for 
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each dataset /. Since the mean value of the activity index is not known, a lin¬ 
ear correlation model should also include a constant offset (eg. v = Cia + bj, 
where bj should be a free parameter as well). However note that such con¬ 
stant is automatically absorbed by 7 /, further motivating the use of different 
7 / for each instrument. All orbital parameter values are given for a given 
reference epoch £ 0) which we arbitrarily assume to be the first observation 
date. 

Concerning the statistical description of the data, and under the same 
assumptions as R14 (white noise, statistically independent measurements), 
we define the logarithm of the likelihood function as 


InL = 


AT 1 N oba 1 N obs ( 7 


t,e\) 


1=1 


i=l 


e? + sj 


( 2 ) 


which is our merit statistic to be maximized. The Sj parameter is often called 
jitter and accounts for extra white-noise of each dataset I. When sj = 0, 
maximizing this log-likelihood function is equivalent to minimizing the y 2 
statistic. When including correlation terms to the data, the jitter parameter 
is even more necessary. That is, the index a also has uncertainties and 
might include variability not traced by the Doppler measurements. As for 
the constant offsets, 7 /, this extra-jitter will be accounted for through sj of 
each dataset. 

R14 proposed to split the Doppler series of GJ 581 in five chunks which 
implicitly assume five independent sets (each one with a possibly different 
7 /, Ci and jitter level si). Details and motivation for such slicing are given in 
R14 and are based on apparent intervals of stronger activity and the natural 
yearly sampling of the data. Consequences of applying this slicing of the 
data are discussed later. 


A.2 Likelihood-ratio periodograms 


A periodogram is a representation of the improvement of some merit statistic 
when a sinusoidal signal is included in the model. Because of its numerical 
efficiency, the most widely spread algorithm to compute periodograms is 
the so-called Lomb-Scarge periodogram (or LS). The LS algorithm adjusts 
one sinusoid to each test period resulting in a plot of the period (x-axis) 
against the improvement of the y 2 statistic. A detail ed d erivation of the 
LS periodogram from y 2 minimization is given in Scarglel f 1982th Under 
the assumptions of the method, the peaks and significances derived from LS 
periodograms are only representative of a test of significance when there are 
no other signals present in the time-series. 
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This problem can be circumvented by realizing that a peri odog ram can 
be use d as a r epresentation of the overall model improvement ( Baluev . 20091 
Anglada-Escude fc TuomiL 2012) by using a merit statistics of the complete 


model. That is, when searching for evidence of a k + 1 periodicity, we also 
need to simultaneously adjust for all the other free parameters. This makes 
such periodograms computationally expensive, and one needs to create spe¬ 
cific implementation of the algorithms instead of using freely distributed 
tools. Periodogram procedures based on adding one signal at a time are 
sometimes called hierarchical methods (detection is done from most signif¬ 
icant to smaller signal). More general methods that directly explore the 
full parameter space exist but will not be discussed here for brevity. Some 


Chain algorithms ( 

Gregory 

, 20ll, Dclaved Rejection Adaptive Metropolis 

(or DRAM 

Tuomi et ah. 2 

014) and Markov Chains with nested sampling 

(Brewer & Donovan. 2015) 



In our analysis, we use our custom made software to perform optimiza¬ 
tion of the likelihood function at the period search level. It differs from other 
Keplerian fitting codes in the sense that allows adjusting correlation coeffi¬ 
cients, jitter parameters, and offsets as free parameters as well (further effects 
can be easily incorporated when necessary). For example, note that the Ci 
coefficients are linear parameters, so they can be trivially incorporated in a 
general least-squares solver. All the parameters in v[6\ ti] are converged using 
regular least-squares solving methods, and the parameters of the likelihood 
(eg. jitter terms s/) are converged using steepest descent steps. This process 
is iterated a few times until a small threshold 5\nL is registered between 
iterations. The solution is finally converged to the local likelihood maxima 
(periodogram peak) using annealing. At the signal search level, the k + 1 
signal is always considered sinusoidal (circular orbit) to avoid problems with 
th e non-linear behaviour of high eccentricities (see discussion in Appendix A 


m 


Anglada-Escude et all 120131) . Given that adding two more parameters(e 


and uj) can only improve the fit to the data, beating a given significance 
threshold for the sinusoid provides a sufficient condition for detection. Sig¬ 
nificance assessments are finally provided using False alarm probability (or 
FAP) estimates as described in Baluev! ( 20091. 2013b ). FAPs smaller than 
1% usually imply significant detections and more accurate significances can 
be later estim ated u sing the in tegration of the Bayesian posterior distribu¬ 
tion (eg. Tuomi fe Jones! . 20121) . Our complete 3 + 1 planet model for five 
datasets contains 32 free parameters: 3x5 Keplerian ones, 3 parameters for 
the k + 1 sinusoid, and 3x5 parameters for the five subsets. The periods 
of the test sinusoids are initialized over 8000 seed values uniformly sampled 
in frequency (1/P) between 1/1.1 and 1/20000 days -1 . The result of this 
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Fig. 2: Likelihood ratio search for a 4th signal demonstrating that the pres¬ 
ence of GJ 581d is still strongly supported by the data, despite the 
correlations with H a . 


procedure is illustrated in Fig. [2j Generating such periodogram took ~ 30 
min on a standard 2.5GHz single-core CPU. More optimal implementations 
of likelihood periodograms are given in Baluev 1 2013afl . 


A.3 Results 

The Doppler time-series and were used as provided in R14. Also as 
in R14, one outlying I# Q measurement was removed form the third dataset 
(JD = 2454610.74293 days, likely caused by a flare). R14 only removes 
correlations on three of the subsets based on apparently higher correlations. 
We allow all five coefficients to be free parameters assuming that they can 
be naturally zero if that value is preferred by the global fit. 

As in R14, the three first signals at periods 5.3686, 12.914 and 3.1490 
days (GJ 581b, GJ 581c, and GJ 581e respectively) are easily detected de¬ 
spite the correlations with I# a . The likelihood-ratio periodogram search for 
the 4-tli signal (GJ 581d) is shown in Fig. [2] (red line). The signal is well 
detected above the 0.1% FAP line, implying a significant detection beyond 
reasonable doubt. The black dots represent our periodogram algorithm ap¬ 
plied to the residuals to the 3-planet + correlations in an attempt to replicate 
R14 analysis more closely. While this procedure shows lower significance, we 
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Fig. 3: Doppler measurements for each subset (chronological order from left 
to right) after subtracting all parameterized effects (planets and cor¬ 
relations) except GJ 581d and phase folded to a period of 66.7 days. 
The red line show the maximum likelihood fit to the orbit. The sig¬ 
nal is well traced in all the subsets (2nd,3rd and 5th) with enough 
observations. 


find that the significance of GJ 581d still does not fall below the 1% FAP 
threshold, suggesting additional relevant differences between our model and 
R14 (eg., jitter is not optimized in R14, and it is unclear whether constant 
offsets for each data chunk were solved as free parameters). In any case, the 
much higher value of the red curve clearly shows that significance is strongly 
boosted (A In L ~ 13 implies ~ e 13 higher significance) when all free param¬ 
eters are adjusted. This feature is characteristic of parameter degeneracy 
(activity signal is similar to the planet candidate’s one), but such partial 
degeneracy alone is not sufficient to negate the significance of a much better 
model on statistical grounds. 


A.4 No evidence for correlations changing with time 


A time variable correlation can be easily explained by unrelated signals in 
both RV and \u a in a simil ar pe rio d do main (correlation does not imply 
causality, see discussion in Velickovic, 2015). Just as an example, Jupiter also 
has an orbital period (11.86 years) comparable to the activity cycle of the Sun 
(~ 11 years). Unless the curves are in perfect phase, the analysis of R14 would 
also detect time-dependent correlations. While skepticism would be natural 
if only one cycle was covered, accumulated observations over several cycles 
would clearly differentiate both signals (unless one keeps adjusting a time- 
dependent correlation on arbitrary data-slices). The Sun-Jupiter example 
can be compared to the period of 66.7 days of GJ 581d to the signal in l Ha 
at ~ 130 days (and harmonics) discussed in R14. A second consequence of 
forcing corrections into the Doppler time-series is the increase in the noise 
floor for the RVs themselves. That is, Doppler time-series become limited by 
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Fig. 4: Doppler versus FR correlation plots for each subset. As for the phased 
plots in Figure [3l all signals except the RJpha/BV correlation have 
been subtracted to improve visualization. The correlation slopes of 
the sets with more observations (2nd and 5th panel) cast serious doubt 
on the proposed time-variability of the correlation law. The values 
of the measured slopes are C\ = —420 m s ' 1 , C 2 = —670 m s -1 , 
C 3 = —1040 m s -1 , C 4 = —190 m s -1 and C 5 = —630 m s -1 , being 
Ci and and C 4 the smallest but also most uncertain given the small 
ammount of observations in those subsets. 

the scatter in the activity indices, and -what is worse- they can be severely 
contaminated by correlated variability of the indices as well. 

While we agree with R14 that correlations of RV/I// a are significant, our 
analysis does not support the reported time-dependence of the correlation 
either. Note that more discrepant slopes in Figure [4] (Panels 1 and 4) corre¬ 
spond to seasons where the number of points and the range of H Q variability 
is much smaller. Given that the model is very complex (30+ parameters) 
and non-linear, proper quantification of the uncertainties in the correlation 
coefficients requires sampling techniques of the posterior density (eg. Monte 
Carlo Markov Chain sampling of the posterior) which is beyond our scope 
here. 


A.5 Conclusions 


The failure to confirm GJ 58Id by R14 seems to be related to the analysis of 
residual data and improper interpretation of periodograms. R14 attempted 
several correlations with activity indices and applied a rather arbitrary slicing 
of the datasets. Furthermore, selecting apparently active sub-sets and fitting 
correlations to those should be avoided as it constitutes a circular argument. 
The same problem likely explains the non-detectio n of the very significant 
signal GJ 667Cd in lRobertson fc Mahadevanl (1201414 . even under the assump¬ 
tion of white noise. In that case, the authors sliced the RV time-series and 
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forced correlations with another activity index (the so-called FWHM). As for 
GJ 581d -and Jupiter for the Sun-, GJ 667C also show evidence for variabil¬ 
ity in periods comparable (Pfwhm ~ 105 days) to the period of the proposed 
planet candidate (P ( i = 91 days). Again, removing time-dependent correla¬ 
tions on data slices necessarily decrease their apparent significance, especially 
in periodograms of residual data. 

The validity of the various models and methods to account for noise 
in Doppler data is a hotly debated topic. Contributions to the discussion 
on benchmark systems should ensure that the applied statistical tools are 
formally correct. Given all these caveats, the analysis presented in R14 should 
be considered inconclusive, and GJ 58Id should be reinstated as a planet 
candidate until additional observations suggest otherwise. 
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